[Annotated Version]
SUMMARY OF CLAIMED SUBJECT MATTER [underlined in original]
The present invention contains claims to a method for discovering patterns in a set of
sequences of symbols (claims 35-42 and 44), to a computer-readable medium containing a
data structure useful by a computer system in the practice of the method (claim 66), and to a
computer-readable medium containing instructions for controlling a computer system to
perform the method (claim 68).

In all of its various aspects the
present invention is directed to the identification of existing patterns in a set of "k" number of 
sequences. The k number of sequences is termed a "k-tuple". The k number of sequences
form part of an overall set of "w" number of sequences. Each of the w sequences has a given
length, but the sequences need not be the same length. A "pattern" is a distributed substring 
of elements that occurs in at least two sequences in a set of sequences (Page 7, lines 36-38). 

The basic steps that comprise the core of the method of the present invention may be 
understood from the following discussion of a "two-tuple" (k = 2) of sequences S1 and S2.

Each member element of a sequence is represented by an alphabetic symbol. Page 7,
lines 30-31 and 33-34 show the two representative sequences S1 and S2. Each symbol
occupies a given location in a sequence. This location is termed the symbol's "position 
index". The pairing of a symbol and its position index identifies a unique symbol at a unique 
location in a sequence. 

The first step of the method is to create for each sequence a table of ordered (symbol, 
position index) pairs. For instance, in the sequence S1 the symbol "L" occurs at position
indices 18 and 46 while the symbol "K" occurs at position indices 20, 25, 34 and 35. In the
sequence S2 the symbol "L" occurs at position indices 6, 23 and 30 and the symbol "K" occurs
at position indices 8, 10, 14 and 32. The creation of this table of ordered pairs corresponds to 
step (a) of claim 68. 

The association of each symbol and its position index is used to form a "master offset 
table" for each sequence. Figure 1 shows two master offset tables for the "two-tuple" of 
sequences S1 and S2. Each master offset table groups, for each symbol, the position in the
sequence occupied by each occurrence of that symbol. The master offset tables are the first
data structures recited in claim 66 and are created by step (b) of claim 68.

Thus, in the master offset table of Figure 1 for the sequence S1 the position indices
"18" and "46" are listed under the symbol "L" while position indices "20", "25", "34" and "35"
are listed under the symbol "K". Similarly, for the sequence S2 position indices "6", "23" and
"30" are listed for the symbol "L" and position indices "8", "10", "14" and "32" are listed for
the symbol "K".
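
By way of illustration only, the grouping performed by a master offset table may be sketched in Python as follows. The function name `master_offset_table`, the toy sequence and the use of zero-based position indices are assumptions made for this sketch; they are not taken from the specification or from Figure 1.

```python
from collections import defaultdict


def master_offset_table(sequence):
    """Group, for each symbol, the positions it occupies in `sequence`.

    Position indices are zero-based here (an assumption of this sketch).
    """
    table = defaultdict(list)
    for position, symbol in enumerate(sequence):
        table[symbol].append(position)
    return dict(table)


# Hypothetical toy sequence, not one of the sequences S1 or S2.
print(master_offset_table("ABCAB"))  # {'A': [0, 3], 'B': [1, 4], 'C': [2]}
```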

Next, the difference-in-position between each occurrence of a symbol in one of the 
sequences and each occurrence of that same symbol in the other sequence is determined. This 
determination is facilitated by concatenating the two sequences. This is described at Page 10, 
line 15. A table, termed a "pattern map" (page 9, lines 32-33) or a "tuple-table" (page 30, 
line 25 through page 31, line 15), is formed in which each row in the table represents a
single value of "difference in position" (page 9, line 20 through page 10, line 6). This pattern
map is the "k-tuple table data structure" recited in claim 66 and is produced by step (c) of
claim 68. 

Figures 2A and 2B depict the pattern map for the two-tuple of sequences S1 and S2.
Since sequence S1 contains 47 characters and the sequence S2 contains 54 characters the
pattern map is 101 rows in depth (rows numbered "0" through "100"). For each given
value of a difference-in-position (the value being termed the "row index") the table lists the
position of each symbol in the first sequence that appears again at a spacing corresponding to
that difference-in-position value. The column of row indices in the table of Figures 2A and
2B is referred to as "the primary column" in claim 66.
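
A minimal sketch of how such a pattern map might be assembled is given below. It assumes that the difference-in-position is measured in the concatenation of the two sequences, i.e. as (length of first sequence) + (position in second sequence) - (position in first sequence), which is consistent with the 47 + 54 = 101 rows stated above. The names `offsets` and `pattern_map` and the toy sequences are hypothetical, and the layout of the table in Figures 2A and 2B may differ.

```python
from collections import defaultdict


def offsets(sequence):
    """Master offset table: symbol -> list of positions (zero-based, assumed)."""
    table = defaultdict(list)
    for position, symbol in enumerate(sequence):
        table[symbol].append(position)
    return table


def pattern_map(first, second):
    """Map each difference-in-position (row index) to the positions in
    `first` whose symbol recurs in `second` at that spacing."""
    rows = defaultdict(list)
    table_first, table_second = offsets(first), offsets(second)
    for symbol, positions_first in table_first.items():
        for p1 in positions_first:
            for p2 in table_second.get(symbol, []):
                # Spacing measured in the concatenation first + second.
                rows[len(first) + p2 - p1].append(p1)
    # Sort rows by row index and sort the position indices within each row.
    return {row: sorted(ps) for row, ps in sorted(rows.items())}


# Hypothetical toy sequences, not the sequences S1 and S2 of the specification.
print(pattern_map("ABCAB", "XABY"))  # {3: [3, 4], 6: [0, 1]}
```

In the toy output, each row collects the positions of "AB" in the first sequence that recur in the second sequence at the same spacing.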

Consider the symbol "R" listed in the master offset table for the sequence S1 (at
position index "44") and the position indices for the same symbol "R" as listed in the master
offset table for the sequence S2 (position indices "7", "21", "31"). From the master offset
tables and the concatenation of the sequences S1 and S2 at page 7 it may be determined that

- from the occurrence of the symbol "R" in the first sequence,

  - the first occurrence of the symbol "R" in the second sequence is
    spaced ten places;

  - the second occurrence of the symbol "R" in the second sequence is
    spaced twenty-four places; and

  - the third occurrence of the symbol "R" in the second sequence is
    spaced thirty-four places.

The pattern map of Figures 2A and 2B thus lists the position index "44" (corresponding to the
symbol "R") on row indices (difference-in-position values in the primary column) of "10",
"24" and "34".

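The three spacings stated above for the symbol "R" can be checked with the following short sketch. It uses only the figures given in this summary (a first sequence of 47 characters, "R" at position index 44 in the first sequence and at 7, 21 and 31 in the second) and assumes that the spacing is counted in the concatenated sequences. The function name `spacings` is hypothetical.

```python
def spacings(len_first, pos_in_first, positions_in_second):
    """Spacing, in the concatenated sequences, from one occurrence in the
    first sequence to each occurrence in the second sequence."""
    return [len_first + p2 - pos_in_first for p2 in positions_in_second]


# Values taken from the text: len(S1) = 47, "R" at 44 in S1 and at 7, 21, 31 in S2.
print(spacings(47, 44, [7, 21, 31]))  # [10, 24, 34]
```
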
The symbols collected for any row index (each value of difference-in-position) define
a parent pattern in the first sequence that is repeated in the second sequence.

Consider the discussion at page 11, line 24 through page 12, line 30 for the row index
value "35" in the primary column in the pattern map (or "two-tuple-table") of Figure 2A. This
row index value identifies the pattern corresponding to the symbols at position indices "18",
"20", "21", "30", "39" and "40". [The value "6" in the symbol count column to the immediate
right of the colon on Figure 2A (page 10, line 37 through page 11, line 2) indicates that there
are six symbols in the pattern.] Figures 2A and 2B thus show a sorted k-tuple table that is
created by row-sorting the k-tuple table by the position indices in the primary column. This
sorted k-tuple table is the "sorted k-tuple table data structure" recited in claim 66 and is
included in step (d) of claim 68.

By consulting sequence S1 the position indices "18", "20", "21", "30", "39" and "40"
respectively correspond to the symbols "L", "K", "V", "V", "P", "H".

The collected symbols corresponding to a difference-in-position value "35" thus
identify the pattern occurring in the first sequence S1 as:

"L.KV........V........PH"

that also appears in the second sequence S2 (page 12, line 16), where the dots indicate
placeholders in the pattern (page 12, lines 19-21). This is recited in step (e) of claim 68.
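
As an illustrative sketch only, a row of position indices and its corresponding symbols may be rendered as a pattern string with dot placeholders as follows. The function name `render_pattern` is hypothetical; the position indices and symbols are those given above for the row index "35".

```python
def render_pattern(positions, symbols, placeholder="."):
    """Lay out `symbols` at their `positions` (ascending), filling every
    intervening position with `placeholder`."""
    start = positions[0]
    cells = [placeholder] * (positions[-1] - start + 1)
    for position, symbol in zip(positions, symbols):
        cells[position - start] = symbol
    return "".join(cells)


# Position indices and symbols stated in the text for row index 35.
print(render_pattern([18, 20, 21, 30, 39, 40], "LKVVPH"))
# L.KV........V........PH
```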

Claim 66 is directed to a computer readable medium containing a data structure useful by
a computer system in the practice of the method steps described above.

Claim 68 is directed to a computer-readable medium containing instructions for
controlling a computer system to discover one or more patterns in two sequences of symbols
S1 and S2 by performing the method steps described above.

Both claims 66 and 68 contain language using difference-in-position values as the
selection criteria for identifying repeating patterns of symbols.